During the last decade, medical imaging has attracted a great deal of research
interest. Deep learning applications have revolutionized medical image
analysis and disease diagnosis. Convolutional neural networks (CNNs), a
class of deep learning models, have been widely used for classification and feature
extraction, and they have shown good performance in various imaging
applications. However, despite the advances in medicine, malaria remains
among the world’s deadliest diseases. In 2020 alone, malaria accounted for 241
million clinical episodes and 627,000 deaths. The disease is examined
visually through a microscope, a process that depends on the pathologist’s
experience and skills, so results may vary between laboratories. This
paper proposes an efficient CNN architecture that could be used in
diagnosing malaria. Trained and evaluated on 27,558 red blood cell
images, balanced between parasitized and unparasitized cells, from a
publicly available malaria dataset from the National Institute of Health, the
proposed model achieves accuracy rates of 99.8%, 98.2%, and 97.7%
on the training, validation, and testing sets, respectively. Furthermore, the results
indicate that the proposed model outperforms the state-of-the-art
models.
1. INTRODUCTION

Malaria is one of the most serious global health issues. According to the World Health Organization (WHO),
around 241 million cases of malaria were recorded worldwide in 2020, with 627,000 deaths. Most malaria
cases and deaths occurred in the WHO African Region, which was home to 95% of all cases and 96% of malaria
deaths. The WHO report showed that children under 5 years of age were the most susceptible group,
accounting for roughly 67% of all malaria deaths worldwide. Moreover, malaria poses a threat to more than
half of the world’s population. In addition to Africa, the WHO regions of the Eastern Mediterranean,
South-East Asia, the Western Pacific, and the Americas are also at risk of malaria. Malaria is caused by
Plasmodium parasites that infect people through the bites of infected female Anopheles mosquitoes. The
greatest threat to humans comes from two parasite species, P. falciparum and P. vivax. Statistics
show that P. falciparum is responsible for 99.7% of estimated malaria cases in the WHO African Region,
71% of cases in the Eastern Mediterranean, 65% in the Western Pacific, and 50% of cases in the WHO
South-East Asia Region, while P. vivax is the predominant parasite in the WHO Region of the Americas,
accounting for 75% of malaria cases [1]. The WHO adopted a global technical strategy to reduce malaria
cases and death rates by at least 40% by 2020, 75% by 2025, and 90% by 2030 from a 2015 baseline. However, the
strategy needs estimated annual resources starting at US$ 4.1 billion in 2016 and growing to US$ 6.8 billion
in 2020. Moreover, around US$ 0.72 billion is required yearly for malaria research and development [1].

Malaria symptoms usually appear within two weeks after the infective mosquito
bite. The first symptoms include fever, headache, and chills, but they can be mild
and therefore difficult to recognize as malaria [1]. Malaria patients need a fast diagnosis because the disease
can progress to severe illness if not treated within 24 hours. Malaria typically spreads in regions affected by
poverty and instability. Malaria diagnosis is a time-consuming process that depends on the pathologist’s
skills and takes considerable effort. Moreover, working in a resource-limited environment without supporting
systems may affect the diagnostic quality and lead to wrong assessments. Diagnosis of malaria should be
reliable and highly sensitive [2]. Detecting malaria at an early stage can reduce the severity of the disease,
particularly in endemic regions where pathologists are scarce and the workload of examining blood smear
films is heavy. Automating malaria diagnosis is cost-effective: it enhances diagnostic accuracy, assists
pathologists in the examination process, and speeds up diagnosis [3].

As systems based on artificial intelligence grow quickly, the diagnosis of several diseases has become more
efficient. One of the newest techniques of artificial intelligence is deep learning. Deep learning belongs to the
family of biologically inspired machine learning techniques that were developed to emulate the human brain’s
ability to process information and make decisions [4]. Deep learning techniques can be used to build automated
disease diagnosis systems with appreciable detection rates, which aid pathologists in making correct
diagnoses. Studies in the literature show that deep learning models can learn from experience, much as
humans do. In traditional neural networks, input and output layers are linked by a few hidden
layers, whereas deep neural networks have a larger number of hidden layers. In deep learning, the hidden layers
extract useful features automatically by analyzing the data, thus eliminating the need to compute hand-crafted
features [5]. Convolutional neural networks (CNNs) are a modern class of deep learning techniques that are
widely used in the diagnosis of diseases. With their shared-weights architecture and translation invariance,
CNNs are effective for image classification and have shown good performance in various imaging
applications [6]. There are several state-of-the-art CNN architectures, such as AlexNet [7], GoogleNet [8],
ResNet [9], and VGG16 [10]. In this paper, we propose an efficient CNN model for malaria detection through the
classification of parasitized and unparasitized red blood cell images.

CNNs have proved their efficiency in tasks related to image processing
and computer vision, such as classification, feature extraction, and the analysis of medical images. A CNN
offers an efficient way to gather image information and features using filters; this information can then be
passed to machine learning algorithms or feedforward neural networks to carry out the given task [11].
Several machine learning methods have been proposed to diagnose malaria. Dong et al. [12] evaluated three
well-known pretrained CNN architectures: LeNet, AlexNet, and GoogleNet. The authors created an image
dataset of malaria-infected human red blood cells with 2,565 images: 1,034 infected cells and 1,531
non-infected cells. The evaluation showed that all these deep CNN architectures achieved a
classification accuracy of 95%. Tasdemir and Qanbar [13] proposed a deep learning approach based on a
residual attention network (RAN) to classify and diagnose malaria. A RAN is built by combining multiple
attention modules, where each module is split into two branches, mask and trunk. This approach was
evaluated on the same dataset as our study and showed a classification accuracy of 95.79%.
Vijayalakshmi and Rajesh [14] proposed a transfer learning approach to recognize parasitized malarial blood
cell images by unifying a visual geometry group (VGG) network and a support vector machine (SVM). A dataset
of 1,530 images was used to evaluate this approach, resulting in a classification accuracy
of 93.1%. Irmak [15] proposed a deep learning approach to detect malaria. The author designed
a CNN architecture that consists of 20 weighted layers to distinguish parasitized microscopic images from
unparasitized ones. The proposed approach achieved 95.28% accuracy. Oyewola et al. [16]
proposed a data augmentation convolutional neural network model. The authors used data augmentation
techniques such as random rotation, random translation, and horizontal and vertical scaling to build the model.
The model was trained by reinforcement learning and achieved 94.79% accuracy. Magotra and Rohil [11]
proposed a custom lightweight CNN architecture to diagnose malaria cells. The model was trained in several
configurations depending on the ratio of data being passed to the model. The model achieved an accuracy of
around 96%. The researchers in [11], [15], and [16] evaluated their CNN models using the same dataset as
our study.


2. METHOD 
2.1. Dataset 

The proposed architecture is evaluated using a dataset that is publicly available online on the
website of the National Library of Medicine (NLM) [17]. The dataset consists of 27,558 segmented red blood
cell images with equal samples of parasitized and unparasitized cells. The dataset was collected from 201 patients,


A deep learning based architecture for malaria parasite detection (Yousef Alraba nah) 


294 o ISSN: 2302-9285 


151 of whom were infected and 50 of whom were not. The dataset contains parasitized and unparasitized red
blood cell images, which vary in color and shape because different blood stain samples were used
during the data acquisition process. Samples of parasitized and unparasitized red blood cell images are shown
in Figure 1.


[Image panels: Parasitized, Unparasitized]


Figure 1. Parasitized and unparasitized cells 


2.2. Convolutional neural networks 

CNNs are the most widely used deep learning technique that exploits a neural network architecture with
many hidden layers. CNNs showed significant performance results in the ImageNet large scale visual recognition
competition (ILSVRC) and are considered an effective technique for learning spatial information from
2-dimensional images and videos [18]. Each CNN layer extracts patterns and builds a higher-level
representation of the image components. Each layer also passes its output to the subsequent layer,
which learns progressively more invariant features. For example, the initial layers learn simple image patterns like
edges or colors, while the upper layers learn patterns that are more complex [19]. Three main types of layers
make up any CNN model: the convolutional layer, the pooling layer, and the fully connected layer. The convolutional
layer is the first layer and is the core building block of CNNs. The convolutional layer can be followed by
additional convolutional layers or pooling layers. The pooling layer
performs downsampling operations to reduce the input parameters and dimensionality. The last layer is the fully
connected layer; it carries out the classification task based on the features extracted by the preceding layers and
their filters. With each layer, as
the input image proceeds through the CNN, the CNN identifies greater portions of the image. It continually
recognizes more and more portions until it finally identifies the image [20].

In convolutional layers, the central operation is the convolution, which gives the network its
name. The convolution is a linear process that multiplies
an input data array by a 2-D array of weights called a filter or kernel. The result of the convolution is a
feature map, which outlines the detected features of the input [21]. The filter is smaller than the input data
and convolves systematically over each overlapping filter-sized patch of the input in the horizontal and
vertical directions. The filter slides over the input data and performs dot products between its entries and the
input entries. The idea of convolving the same filter over the entire input is powerful; it gives the filter
the ability to discover the intended feature anywhere in the input, which is generally referred to as
translation invariance [22]. Translation invariance is concerned with whether a feature is present rather than
where it is present. The volume size of the convolution output is affected by
hyperparameters such as the number of filters, the stride, and the padding [23]. The depth of the output is
determined by the number of filters. For instance, using three filters results in an output of depth three, with one
feature map per filter. The stride
specifies the number of array elements by which the filter moves over the input data. Stride values of two or less
are commonly preferred, while large stride values are rarely used and produce smaller outputs.
Padding is used to preserve the spatial dimensions of the input data after convolution. In cases where the filter
does not fit the input, padding sets any element outside of the input to zero, which produces an equally sized
output [23]. After each convolutional layer, CNNs apply a nonlinear activation function named the rectified linear unit
(ReLU) to compute the feature maps. The ReLU function replaces all negative values in the feature maps
with zeros, while any positive value remains as is [24].
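The convolution, stride, padding, and ReLU steps described above can be illustrated with a minimal sketch in plain Python. The function names `conv2d` and `relu`, the 4x4 toy image, and the vertical-edge filter are assumptions chosen for illustration; they are not taken from the paper's implementation, which relies on Keras.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide a square `kernel` over a 2-D `image` (list of lists) and return the feature map."""
    k = len(kernel)
    # Zero padding: surround the input with `padding` rows and columns of zeros.
    width = len(image[0]) + 2 * padding
    padded = [[0.0] * width for _ in range(padding)]
    padded += [[0.0] * padding + list(row) + [0.0] * padding for row in image]
    padded += [[0.0] * width for _ in range(padding)]
    # Output size along each axis: floor((padded_size - k) / stride) + 1.
    out_h = (len(padded) - k) // stride + 1
    out_w = (width - k) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the filter and the patch it currently covers.
            total = 0.0
            for a in range(k):
                for b in range(k):
                    total += kernel[a][b] * padded[i * stride + a][j * stride + b]
            row.append(total)
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """Replace negative values with zero; positive values remain as they are."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Hypothetical toy input and filter (illustrative values only).
image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
edge_filter = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

valid = conv2d(image, edge_filter)                        # no padding: 4x4 input gives 2x2 output
same = conv2d(image, edge_filter, padding=1)              # one ring of zeros keeps the 4x4 size
strided = conv2d(image, edge_filter, stride=2, padding=1) # larger stride gives a smaller output
activated = relu(valid)                                   # negative responses become zero
```

Because the same `edge_filter` weights are reused at every position, the filter responds to a vertical edge wherever it occurs in the input, which is the property the text refers to as translation invariance.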

A pooling layer is used as a downsampling layer between two successive convolutional layers.
The pooling layer carries out a spatial dimensionality reduction operation to minimize the size of the feature
maps, which reduces the number of parameters and computations and aids efficiency [25]. The pooling
layer provides the input for the next layer and allows it to focus on larger areas of the input representation.
Like convolutional layers, pooling layers slide a window of a specific size over the feature maps
and apply a subsampling function, which produces mid-level features [25]. Although
some input information may be lost in the pooling layer, pooling helps to limit the chance of overfitting and
reduces complexity. Common pooling functions are max pooling and average pooling. Max pooling is
frequently used as it works better [25], [26].
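As a rough illustration of the two pooling functions mentioned above, the following plain-Python sketch slides a 2x2 window with a stride of two over a small feature map. The helper names `pool2d` and `mean` and the sample values are hypothetical and serve only to show the mechanics.

```python
def pool2d(feature_map, size=2, stride=2, reduce=max):
    """Slide a size x size window over a 2-D feature map and reduce each window to one value."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(reduce(window))
        pooled.append(row)
    return pooled


def mean(values):
    """Arithmetic mean, used here as the average-pooling function."""
    return sum(values) / len(values)


# Hypothetical 4x4 feature map (illustrative values only).
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

max_pooled = pool2d(feature_map)               # keeps the strongest response in each window
avg_pooled = pool2d(feature_map, reduce=mean)  # keeps the average response in each window
```

Both variants halve each spatial dimension here (4x4 to 2x2), so the next layer receives a quarter of the values while each value summarizes a larger region of the input.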

Fully connected layers resemble a traditional neural network, where neurons have full connections
to all neurons in the preceding layer. In practice, fully connected layers in CNNs consist of a series of
perceptron layers (usually two or three layers) [27]. A fully connected layer takes the output of its preceding
layer, which represents the learned features, and classifies it into the target class. The last fully connected layer
has an output size equal to the number of class labels. The layer preceding the fully connected layers may be a
convolutional or pooling layer, whose output is flattened before being fed into the fully connected
layer. Fully connected layers typically employ a softmax or sigmoid activation function to predict the outputs
properly [28].
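The flatten, fully connected, and output-activation steps can be sketched in a few lines of plain Python. The helper names, weights, and bias below are made-up values for illustration; in a trained network they would be learned parameters.

```python
import math


def flatten(feature_maps):
    """Turn a list of 2-D feature maps into a single 1-D feature vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def dense(inputs, weights, biases):
    """Fully connected layer: every output neuron sees every input value."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def sigmoid(z):
    """Squash one score into (0, 1); suits a single-output binary classifier."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Turn a list of scores into probabilities that sum to one (multi-class case)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical pooled output: one 2x2 feature map.
features = flatten([[[1, 0], [0, 2]]])

# One output neuron with illustrative weights and bias.
score = dense(features, weights=[[0.5, -1.0, 0.25, 0.5]], biases=[-1.0])[0]
probability = sigmoid(score)

# Softmax over two equal scores gives equal class probabilities.
two_class = softmax([0.0, 0.0])
```

For a two-class problem, a single sigmoid output and a two-way softmax carry the same information; the single-output form is the more compact choice.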


2.3. The proposed CNN architecture 

This section introduces the proposed CNN model architecture for the detection of malaria from
microscopic red blood smear images. Figure 2 shows the proposed architecture. The proposed architecture
consists of five successive blocks with several convolutional, pooling, and fully connected layers. The first
block consists of two convolutional layers and one pooling layer; the first layer takes an input image of size
60x60x3, representing width, height, and depth, respectively. The convolutional layers use 64 filters of size 3x3
with a stride value of two and padding set to same to convolve over the input images. Same padding
allows the filter window to slide outside the input so that the filter is applied to all input values. The output of
the second convolutional layer feeds into a pooling layer. As shown in the figure, the pooling layer uses the
max pooling function with a pool size of 2x2 to produce a smaller input representation. This process
continues in the second, third, and fourth blocks with 128, 256, and 512 filters, respectively, with the same filter
size, stride, padding, and pooling size values. The convolutional layers in all blocks of the architecture use the
ReLU activation function.
Figure 2. The proposed CNNs architecture
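The spatial sizes implied by this architecture can be checked with simple arithmetic. The sketch below is not the author's code; it assumes two same-padded convolutions per block with unit stride and a 2x2 max pooling step with a stride of two, since that is the combination under which a 60x60 input yields the 4,608 flatten outputs reported in the text. The function names are hypothetical.

```python
import math


def same_conv_out(size, stride=1):
    """Output size of a convolution with 'same' padding: ceil(size / stride)."""
    return math.ceil(size / stride)


def pool_out(size, pool=2, stride=2):
    """Output size of an unpadded pooling window: floor((size - pool) / stride) + 1."""
    return (size - pool) // stride + 1


def trace_shapes(input_size=60, filters=(64, 128, 256, 512), convs_per_block=2, conv_stride=1):
    """Return the (height, width, depth) produced by each convolutional block."""
    size = input_size
    shapes = []
    for depth in filters:
        for _ in range(convs_per_block):
            size = same_conv_out(size, conv_stride)  # same padding keeps the size at stride 1
        size = pool_out(size)                        # 2x2 max pooling halves the size
        shapes.append((size, size, depth))
    return shapes


shapes = trace_shapes()
height, width, depth = shapes[-1]
flatten_units = height * width * depth  # length of the vector fed to the dense layers
```

Under these assumptions the blocks produce 30x30x64, 15x15x128, 7x7x256, and 3x3x512 volumes, so the flatten layer receives 3 x 3 x 512 = 4,608 values.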


The result of the last pooling layer is passed to a flatten layer with 4,608 output neurons. The flatten
layer is followed by a dense layer of 128 output neurons, followed by another dense layer. The last dense
layer is responsible for classifying the input. As the task is a binary classification problem, the sigmoid
activation function is used in the dense layers to produce the output. Moreover, binary cross entropy is used as
the loss function; in binary classification problems, it computes the error between the actual output and the
predicted output probabilities. In order to minimize the loss function, the Adam optimizer, which uses an adaptive
learning rate, is used to optimize the weight parameters.
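To make the loss concrete, the following minimal sketch computes binary cross entropy over a small batch in plain Python. The function name, the clipping constant `eps`, and the convention that label 1 means parasitized and 0 means unparasitized are assumptions for illustration; the paper does not state its label encoding.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y*log(p) + (1 - y)*log(1 - p)) over a batch of labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Assumed encoding: 1 = parasitized, 0 = unparasitized.
labels = [1, 0, 1, 0]
confident = binary_cross_entropy(labels, [0.95, 0.05, 0.90, 0.10])  # mostly correct, small loss
uncertain = binary_cross_entropy(labels, [0.50, 0.50, 0.50, 0.50])  # coin flip, loss equals ln 2
wrong = binary_cross_entropy(labels, [0.10, 0.90, 0.20, 0.80])      # confidently wrong, large loss
```

The loss grows quickly when the model assigns a high probability to the wrong class, which is the error signal the optimizer uses to adjust the weights.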


3. RESULTS AND DISCUSSION 

This section presents the performance evaluation of the proposed architecture for malaria
detection. The dataset of 27,558 cell images in total is divided randomly into two subsets: 80% for the training
set and 20% for the testing set. Furthermore, the training set is divided into 90% and 10% for training and
validation, respectively. Table 1 shows the partitioning distribution of the dataset for each class.


Table 1. Dataset partitions

Dataset type    Parasitized cells    Unparasitized cells
Training        9,921                9,921
Validation      1,102                1,102
Testing         2,756                2,756
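The per-class counts in Table 1 follow from the 80/20 and 90/10 splits applied to the 13,779 images in each class. The sketch below reproduces that arithmetic; rounding to the nearest integer is an assumption that happens to match the table, and the function name is hypothetical.

```python
def split_counts(per_class, test_fraction=0.20, val_fraction=0.10):
    """Split one class into training, validation, and testing counts."""
    test = round(per_class * test_fraction)  # 20% held out for testing
    remaining = per_class - test             # the 80% training portion
    val = round(remaining * val_fraction)    # 10% of that portion for validation
    train = remaining - val
    return {"training": train, "validation": val, "testing": test}


per_class = 27558 // 2  # the dataset is balanced between the two classes
counts = split_counts(per_class)
```

In practice the split would be done on shuffled image indices rather than on counts alone; the sketch only verifies the sizes.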
All experiments are run on Kaggle with a GPU runtime, 13 GB of RAM, and a 73 GB hard disk.
The model is implemented using the Python programming language and the Keras deep learning library with
TensorFlow. The input image of the proposed architecture should be of size 60x60, so all images are resized
to 60x60 pixels and normalized to provide faster convergence. The pixel values of the images are normalized to
the range 0 to 1. As the red blood cell images are RGB color images, the maximum pixel value is 255 and
the minimum is 0, so the min-max scaling is done by dividing each pixel value by 255. The batch size
was set to 128, which makes 151 steps for each training epoch. The model is initialized with random weights
using the TensorFlow library and trained for 30 epochs.
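The pixel scaling step can be sketched as follows. With a minimum of 0 and a maximum of 255, the general min-max formula (x - min) / (max - min) reduces to x / 255. The function name and the tiny one-row image are hypothetical.

```python
def normalize_pixels(image, min_value=0, max_value=255):
    """Min-max scale every channel of every pixel into the range 0 to 1."""
    span = max_value - min_value
    return [[[(c - min_value) / span for c in pixel] for pixel in row] for row in image]


# Hypothetical 1x2 RGB image: one black-to-white gradient pixel and one mid-gray pixel.
image = [[[0, 128, 255], [51, 51, 51]]]
scaled = normalize_pixels(image)
```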

In order to evaluate the performance, a confusion matrix is used. It is a very common measure that
summarizes the prediction results on a set of test data [29]. Four basic terms are related to the
confusion matrix: true positive (TP), true negative (TN), false positive (FP), and false negative
(FN) [26]. TP counts the parasitized cells that are correctly diagnosed as parasitized, while TN
counts the unparasitized cells that are correctly diagnosed as unparasitized. On the other hand, the number
of unparasitized cells that are incorrectly diagnosed as parasitized is denoted by FP, and the number of
parasitized cells that are incorrectly diagnosed as unparasitized is denoted by FN. Several indicators can be
derived from the confusion matrix and used as performance measures, such as accuracy, sensitivity, specificity,
and precision [30], [31]. Accuracy is the ratio of correctly predicted observations to the total
observations; the accuracy of the model increases as it predicts the parasitized and unparasitized images
correctly. The accuracy is computed as in (1):

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

Sensitivity, as shown in (2), denotes the ability of the model to correctly predict the true positive
observations among all positive observations. Sensitivity is also called recall or true positive rate.

$$\text{sensitivity} = \frac{TP}{TP + FN} \tag{2}$$

Specificity shows the ability of the model to predict true negatives among all negative observations.
It is sometimes referred to as the true negative rate. The specificity is determined as in (3):

$$\text{specificity} = \frac{TN}{TN + FP} \tag{3}$$

Precision represents the ratio of the observations that are correctly predicted positive to the total predicted
positive observations. Equation (4) shows the precision:

$$\text{precision} = \frac{TP}{TP + FP} \tag{4}$$

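The four measures in (1) to (4) can be computed directly from the confusion matrix counts. The sketch below uses made-up counts for illustration only; they are not results from the paper, and the function name is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity, specificity, and precision from confusion matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # equation (1)
        "sensitivity": tp / (tp + fn),                # equation (2), also called recall
        "specificity": tn / (tn + fp),                # equation (3), true negative rate
        "precision": tp / (tp + fp),                  # equation (4)
    }


# Illustrative counts only: 50 parasitized and 50 unparasitized test cells.
metrics = classification_metrics(tp=45, tn=40, fp=10, fn=5)
```

With these counts the model finds 45 of the 50 parasitized cells (sensitivity 0.9) but flags 10 healthy cells as infected, which lowers both specificity (0.8) and precision (45/55).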
All metrics are computed for the proposed CNN model and the results are shown in Figures 3 to 5. Figure 3
shows the training accuracy and validation accuracy for the model. The training accuracy started at 93.5%
in the first epoch, increased to 96% in the second epoch, and remained in the range of 97-100% after the
15th epoch. The validation accuracy for the model was 94% in its first epoch and 93.5% in the second epoch,
and it remained in the range of 95-99% for the rest of the epochs.


[Line plot: training accuracy (train acc) and validation accuracy (val acc) versus epochs, 0 to 30]


Figure 3. Training and validation accuracy 
In Figure 4, the training loss started at 48% in the first epoch, fell to 30% in the second
epoch, and kept going down to 10% in the last epoch. The validation loss started at 21% and
remained in the range of 10-19% for the rest of the epochs. The average accuracy, sensitivity, specificity, and
precision scores for our customized CNN model are shown in Figure 5. The model achieves accuracy
(99.8%, 98.2%, and 97.7%), sensitivity (99.4%, 98.0%, and 97.4%), specificity (99.5%, 98.1%, and 97.5%),
and precision (99.7%, 98.3%, and 97.6%) for the training, validation, and testing sets, respectively. Table 2 shows a
comparative analysis of the proposed model with other deep learning models by different researchers. The
table shows that the proposed custom CNN model outperforms the other models.
Figure 4. Training and validation loss

Figure 5. The proposed model accuracy, sensitivity, specificity, and precision


Table 2. Comparison of the proposed method with state-of-the-art methods

Model                            Accuracy (%)    Number of images    Method             Publication year
Dong et al. [12]                 95.00           2,565               Pretrained CNNs    2017
Tasdemir and Qanbar [13]         95.79           27,558              RAN                2019
Vijayalakshmi and Rajesh [14]    93.10           1,530               Pretrained CNNs    2020
Irmak [15]                       95.28           27,558              Custom CNNs        2021
Oyewola et al. [16]              94.79           27,558              Custom CNNs        2022
Magotra and Rohil [11]           96.00           27,558              Custom CNNs        2022
Proposed model                   97.70           27,558              Custom CNNs        2023
4. CONCLUSION

Malaria is one of the challenges afflicting the world and causes a high mortality rate worldwide,
especially in developing regions such as Africa and Asia. Early diagnosis of malaria provides an opportunity
to treat it, and recent improvements in deep learning help in the early detection of such diseases. In
this paper, an efficient custom deep CNN architecture for the detection of malaria from red blood cell images has
been proposed. The CNN model was trained and tested using a dataset that consists of 27,558 images with
two classes: parasitized and unparasitized. Based on the results, the proposed model achieves an accuracy of
97.7% on the testing set. The results show the effectiveness of the proposed CNN model compared with other
existing methods. In future work, the model can be further refined to improve the classification accuracy. In
addition, the proposed architecture can be investigated for diagnosing other diseases.